fix(cli): add missing subprocess.run() timeouts in doctor and status #3693
Closed
dieutx wants to merge 1 commit into NousResearch:main from
Add timeout parameters to 4 subprocess.run() calls that could hang indefinitely if the child process blocks (e.g., unresponsive docker daemon, systemctl waiting for D-Bus):
- doctor.py: docker info (timeout=10), ssh check (timeout=15)
- status.py: systemctl is-active (timeout=5), launchctl list (timeout=5)

Each call site now catches subprocess.TimeoutExpired and treats it as a failure, consistent with how non-zero return codes are already handled.

Add AST-based regression test that verifies every subprocess.run() call in CLI modules specifies a timeout keyword argument.
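The pattern described in this commit message can be illustrated roughly as follows. This is a minimal sketch, not the code from the PR: the helper name `command_succeeds` is hypothetical, and the Python interpreter stands in for `docker`, `ssh`, `systemctl`, and `launchctl` so the example runs anywhere.

```python
import subprocess
import sys


def command_succeeds(cmd, timeout):
    """Return True only if `cmd` exits with code 0 within `timeout` seconds.

    Hypothetical helper: a timeout is reported the same way as a
    non-zero return code, i.e. as a failed check.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child blocked past the deadline; subprocess.run() kills it
        # before re-raising, so no process is left behind.
        return False
    return result.returncode == 0


# A command that finishes quickly passes the check.
ok = command_succeeds([sys.executable, "-c", "print('ok')"], timeout=10)

# A command that blocks longer than the timeout is treated as a failure
# instead of hanging the caller.
hung = command_succeeds([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1)
```

Folding the timeout into the existing failure path means callers that already handle a failed check need no new branches.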
dlkakbs added a commit to dlkakbs/hermes-agent that referenced this pull request Mar 29, 2026
All subprocess.run() calls in hermes_cli/gateway.py lacked a timeout parameter. If systemctl, launchctl, loginctl, wmic, or ps blocks (e.g. D-Bus unavailable, WMI service stuck, launchd unresponsive), hermes gateway start/stop/restart/status/install/uninstall hangs indefinitely with no feedback to the user.

Timeout values applied:
- Lifecycle commands (start/stop/restart/enable/disable/daemon-reload, launchctl load/unload): timeout=30
- Status/query commands (is-active, loginctl show-user, launchctl list, systemctl status, journalctl, tail, ps aux, wmic): timeout=5-10
- loginctl enable-linger: timeout=10

For _is_service_running() and launchd_status(), TimeoutExpired is caught explicitly and treated as not-running, matching how non-zero return codes are already handled. All other call sites are either inside existing try/except Exception blocks (find_gateway_pids, _enable_systemd_linger, get_systemd_linger_status) or raise TimeoutExpired as a clear error instead of hanging forever.

Same class of fix as NousResearch#3469 (context_references) and NousResearch#3693 (doctor/status).
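The remark about existing try/except Exception blocks relies on subprocess.TimeoutExpired being a subclass of subprocess.SubprocessError, and therefore of Exception. A small sketch of that behavior, assuming a hypothetical `probe` function that is not part of gateway.py:

```python
import subprocess
import sys


def probe(cmd, timeout):
    """Hypothetical call site with a broad handler already in place."""
    try:
        subprocess.run(cmd, capture_output=True, timeout=timeout)
    except Exception as exc:
        # TimeoutExpired lands here too, because it derives from
        # SubprocessError, which derives from Exception.
        return type(exc).__name__
    return "completed"


# The child sleeps far longer than the timeout, so run() raises
# TimeoutExpired and the broad handler absorbs it.
outcome = probe([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1)
```

Call sites without such a handler propagate the exception to the caller, which is the "clear error instead of hanging forever" case mentioned above.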
Summary
4 subprocess.run() calls in CLI utilities have no timeout parameter and can hang indefinitely if the child process blocks. The rest of the CLI already uses timeouts (clipboard.py: 3-15s, banner.py: 5-10s, doctor.py npm audit: 30s); these 4 are the only ones missing.
Same class of bug as #3469 (git/ripgrep subprocess calls hanging on large repos).
Root Cause
subprocess.run() defaults to timeout=None, meaning it waits forever. If the docker daemon is unresponsive, systemctl is waiting for D-Bus, or an SSH host is unreachable past its ConnectTimeout, hermes doctor and hermes status hang with no feedback.
Fix
- docker info: add timeout=10
- ssh ... echo ok: add timeout=15 (slightly above SSH's own ConnectTimeout=5)
- systemctl --user is-active: add timeout=5
- launchctl list: add timeout=5
Each call site catches subprocess.TimeoutExpired and treats it as failure, consistent with how non-zero return codes are already handled.
Tests
1 new AST-based regression test in tests/hermes_cli/test_subprocess_timeouts.py, parameterized across all 4 CLI modules that use subprocess.run(). It parses each file and asserts every call has a timeout keyword, which catches future regressions automatically.
11 pre-existing doctor/status tests + 4 new = 15 passed.
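An AST-based check of this kind might look roughly like the sketch below. It is not the contents of test_subprocess_timeouts.py: the function name `find_run_calls_without_timeout` and the inline `SAMPLE` source are assumptions made for illustration, and a real test would read the CLI module files instead.

```python
import ast


def find_run_calls_without_timeout(source):
    """Return sorted line numbers of subprocess.run(...) calls lacking timeout=.

    Hypothetical helper. It only matches the literal `subprocess.run` form;
    `from subprocess import run` or a timeout hidden inside **kwargs would
    need extra handling.
    """
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        if not is_run:
            continue
        # Keyword arguments are stored on the Call node; look for timeout=.
        if not any(kw.arg == "timeout" for kw in node.keywords):
            missing.append(node.lineno)
    return sorted(missing)


# Illustrative source text: the first call has a timeout, the second does not.
SAMPLE = '''
import subprocess
subprocess.run(["docker", "info"], capture_output=True, timeout=10)
subprocess.run(["launchctl", "list"], capture_output=True)
'''

missing_lines = find_run_calls_without_timeout(SAMPLE)
```

A test built on this idea would assert that the returned list is empty for each module, so any newly added call without a timeout fails the suite.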